Summary¶

This notebook pulls a NIfTI dataset from OpenNEURO (if it isn't already present locally), processes the NIfTI data with nibabel, and finally displays the data as a set of images. Included at the end of the demo are a few libraries and ideas for extending this notebook.

We use a few external libraries (documentation for each is linked below):

  • OpenNEURO
  • nibabel
  • matplotlib
  • numpy
  • scipy
In [ ]:
# This block checks whether the openneuro package is installed.
# If it isn't, it tries to install the requirements file.

import subprocess
import sys
import os

try:
    import openneuro
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", f"{os.getcwd()}/../requirements.txt"])

Pulling and displaying MRI data from Openneuro¶

Importing libraries¶

Here's a short description of the libraries this notebook uses and what each library does for us.

OpenNEURO¶

The openneuro package gives us API access to the OpenNEURO data.

Numpy¶

Gives us access to fast array manipulation tools.

Scipy¶

SciPy is a general scientific-computing library; we're particularly interested in ndimage, the N-Dimensional IMAGE module.
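As a quick illustration (a toy example, not notebook code), `ndi.rotate` rotates an array by an angle in degrees, which is exactly how the display function later turns each slice upright:

```python
import numpy as np
import scipy.ndimage as ndi

a = np.zeros((4, 4))
a[0, :] = 1.0          # a bar along the first row
b = ndi.rotate(a, 90)  # rotate by 90 degrees in the array plane

# A 90-degree rotation of a square array preserves its shape, and at exact
# multiples of 90 degrees the output grid aligns with the input grid.
print(b.shape)  # (4, 4)
```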

Matplotlib¶

Matplotlib is a general plotting library; this is what will ultimately render our images.

Nibabel¶

Nibabel is a neuro-imaging data reader. It will open and read our NIfTI data from our *.nii.gz files.

In [1]:
# External
import openneuro as on
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt
import nibabel as nib
# "Python"
import logging
from pathlib import Path
from IPython.display import display, Markdown, Latex

We now tell Python where to put/find our datasets. We define two things here:

  • DATA_PATH: A string describing the path (unix style) relative to the working directory (the directory the notebook is in). Defined in all caps since I'm considering it a constant.
  • data_dir: A Path() Python object; this lets us live in an "OS agnostic world". It's defined in lower case since it's a variable assigned at interpretation time.
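For example (a toy illustration, not notebook code; the sub-directory names are made up), `Path` composes paths with the `/` operator regardless of the host OS:

```python
from pathlib import Path

# Path normalizes "./data" and joins components with the right separator
# for the current OS; as_posix() always shows forward slashes.
p = Path("./data") / "sub-01" / "anat"
print(p.as_posix())  # data/sub-01/anat
```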
Joe's Choices

I decided to put the data into the same folder as the notebook. There's no particular reason to do that, in fact, if you're writing your own project it's probably a bad choice. A better choice would be the root of the repo.

In [2]:
DATA_PATH = "./data"
data_dir = Path(DATA_PATH)

Pull down the data¶

Next we check whether the data is already present. We first check if the parent directory exists; if it doesn't, we create it.

We then check if the list of all .nii.gz files in our data directory is empty. If the list is empty it means there is no nifti data and we should pull down a dataset from OpenNEURO.

Joe's Choices

I picked the Working Memory and Reward in Children with and without Attention Deficit Hyperactivity Disorder (ADHD) dataset. I picked this dataset for no particular reason; this code should "just work" with other datasets. If you look on the OpenNEURO website you can see how big the dataset is: I don't want to download all 40GB of that data, so I use the filtering mechanism in openneuro-py to get only the first subject's files (the part that says include="sub-01/*").

Exercise:

Pick an alternate MRI dataset from OpenNEURO and modify the following markdown and python block to document and pull down that dataset.

Dataset: Working Memory and Reward in Children with and without Attention Deficit Hyperactivity Disorder (ADHD)¶

Dataset ID: ds002424¶

Subject ID: sub-01¶

In [3]:
if not data_dir.exists():
    data_dir.mkdir(parents=True)

if not next(data_dir.glob("**/*.nii.gz"), None):
    on.download(dataset="ds002424", target_dir=data_dir, include="sub-01/*")
👋 Hello! This is openneuro-py 2024.2.0. Great to see you! 🤗

   👉 Please report problems 🤯 and bugs 🪲 at
      https://github.com/hoechenberger/openneuro-py/issues

🌍 Preparing to download ds002424 …
📁 Traversing directories for ds002424 : 0 entities [00:00, ? entities/s]
📥 Retrieving up to 14 files (5 concurrent downloads). 
✅ Finished downloading ds002424.
 
🧠 Please enjoy your brains.
 
participants.json:   0%|          | 0.00/3.30k [00:00<?, ?B/s]
CHANGES:   0%|          | 0.00/291 [00:00<?, ?B/s]
README:   0%|          | 0.00/506 [00:00<?, ?B/s]
participants.tsv:   0%|          | 0.00/14.2k [00:00<?, ?B/s]
dataset_description.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]
sub-01_ses-T1_task-SLI_bold.nii.gz:   0%|          | 0.00/75.3M [00:00<?, ?B/s]
sub-01_ses-T1_task-SLD_events.tsv:   0%|          | 0.00/4.88k [00:00<?, ?B/s]
sub-01_ses-T1_T1w.nii.gz:   0%|          | 0.00/8.44M [00:00<?, ?B/s]
sub-01_ses-T1_task-SLI_events.tsv:   0%|          | 0.00/4.90k [00:00<?, ?B/s]
sub-01_ses-T1_task-SLD_bold.nii.gz:   0%|          | 0.00/77.1M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSI_events.tsv:   0%|          | 0.00/4.91k [00:00<?, ?B/s]
sub-01_ses-T1_task-SSD_bold.nii.gz:   0%|          | 0.00/75.3M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSI_bold.nii.gz:   0%|          | 0.00/75.2M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSD_events.tsv:   0%|          | 0.00/4.90k [00:00<?, ?B/s]

Some Support Functions¶

The next section defines some support functions for processing the data.

We need to satisfy at least $6$ Use Cases while processing the data:

  • read
  • preprocess
  • process
  • display
  • postprocess
  • store
Question: Some of these use cases are more difficult to satisfy than others. Which is the most challenging? Is the difficulty ranking dependent on project goals?

Read¶

The read function loads the nifti data from the fname file.

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

Thanks James for implementing this. Only changes I made were to change the prints to logs.

In [5]:
def read_data(fname):
    logging.info("== Opening file with Nibabel ==")
    img = nib.load(fname)
    logging.info("== Data image from %s has this shape ==", fname)
    logging.info(img.shape)
    logging.info("== On-disk datatype: %s ==", img.get_data_dtype())  # query datatype

    # turn raw data into numbers
    data = img.get_fdata()
    logging.info("== Numpy array from %s has this shape ==", fname)
    logging.info(data.shape)

    # and get a header (for metadata)
    hdr = img.header
    # hdr.get_xyzt_units()
    logging.info("== data header ==")
    logging.info(hdr.structarr)

    return img, data, hdr

Preprocess¶

The preprocess function takes the data (a numpy array) and "removes" the time component, if one exists.

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

Again, I don't really have enough experience working with NIfTI data to know how to do this "properly". I think this works.

In [ ]:
def preprocess_data(data):
    if len(data.shape) == 4:
        data = data[:, :, :, 0]
    return data
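One hedged alternative to keeping only the first volume is averaging over the time axis. This is a sketch, not necessarily "proper" fMRI preprocessing either, and `preprocess_data_mean` is a name invented here, not part of the notebook:

```python
import numpy as np

def preprocess_data_mean(data):
    """Collapse a 4D (x, y, z, t) volume to 3D by averaging over time."""
    if data.ndim == 4:
        data = data.mean(axis=3)
    return data

# Tiny synthetic 4D volume: 2x2x2 spatial grid with 3 time points.
vol4d = np.arange(2 * 2 * 2 * 3, dtype=float).reshape(2, 2, 2, 3)
print(preprocess_data_mean(vol4d).shape)  # (2, 2, 2)
```

Unlike taking the first volume, this uses every time point, at the cost of blurring any temporal signal together.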

Process¶

The process function takes the data (a numpy array) and ...

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

I'm not doing any real data processing. If I were, this is where I would put the code that computes on the data.

In [ ]:
def process_data(data):
    return data

Display¶

The display function takes the data (a numpy array) and displays three views of the MRI: slice samples over each of the spatial coordinates.

The function has optional arguments for the number of rows and columns; these change the number of slices sampled from the data.

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

So I don't really know what I'm doing here. I found some template code and made some minor changes. Probably a good idea to look in more detail.

In [6]:
def display_data(data, fig_rows: int = 4, fig_cols: int = 4, name: str = "MRI"):
    n_subplots = fig_rows * fig_cols

    # One pass per spatial axis: sample slices, plot them in a grid, show.
    for axis, axis_name in enumerate(("x", "y", "z")):
        n_slice = data.shape[axis]
        step_size = n_slice // n_subplots
        plot_range = n_subplots * step_size
        start_stop = int((n_slice - plot_range) / 2)

        fig, axs = plt.subplots(fig_rows, fig_cols, figsize=[10, 10])

        for idx, img in enumerate(range(start_stop, plot_range, step_size)):
            # np.take(data, img, axis=axis) selects the img-th slice along the
            # current axis (equivalent to data[img, :, :], data[:, img, :], ...).
            axs.flat[idx].imshow(ndi.rotate(np.take(data, img, axis=axis), 90), cmap='bone')
            axs.flat[idx].axis('off')

        plt.tight_layout()
        display(Markdown(f'## MRI slices in {axis_name} of {name}'))
        plt.show()

Postprocess¶

The postprocess function takes the data (a numpy array) and ...

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

I don't plan on saving the data. If I were, this is where I would implement the "clean up" code to format the data for storage.

In [ ]:
def postprocess_data(data):
    return data

Store¶

The store function takes the data (a numpy array) and ...

Exercise:

Update this and the following python block as needed for your dataset.

Joe's Choices

I'm not doing any data storage. If I were, this is where I would put things like compressing the data and writing to disk.

In [ ]:
def store_data(data) -> bool:
    return True

Program Code¶

We now get to the meat and potatoes of the program. This is where the main set of instructions goes. Here I've implemented the following state machine.

```mermaid
stateDiagram-v2
    First: foreach nifti file
    state First {
        [*] --> read
        read --> preprocess
        preprocess --> Process
        Process --> Display
        Process --> Postprocess
        Postprocess --> store
        store --> [*]
    }
```
Exercise:

The ultimate goal here is to hit one of these datasets with mapper. To do that you'll need to:

  1. Add some data preprocessing code
  2. Add processing code
  3. Add a mapper implementation
  4. Add a display function for mapper output

Suggestion from James:
"A very quick suggestion is to simply threshold the raw data and only keep a very sparse subset (i.e. 1% of the raw brain data). That should be enough to get people started . We can then punt the challenge of scaling up to later."
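James's thresholding suggestion can be sketched like this. This is a hedged sketch, not existing notebook code: `threshold_sparse` is a name invented here, and the 1% cut comes straight from the quote above.

```python
import numpy as np

def threshold_sparse(data, keep_fraction=0.01):
    """Zero out all but (roughly) the top `keep_fraction` of voxels by intensity."""
    cutoff = np.quantile(data, 1.0 - keep_fraction)
    return np.where(data >= cutoff, data, 0.0)

# Random stand-in for real brain data, just to show the effect.
rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))
sparse = threshold_sparse(vol)
print(np.count_nonzero(sparse) / vol.size)  # roughly 0.01
```

The surviving nonzero voxel coordinates would then be the point cloud handed to a mapper implementation.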

I think this is absolutely the right first step. Who wants to be the next person to take a stab at filling out this notebook for Mapper?

In [25]:
for niigz in data_dir.glob("**/*.nii.gz"):
    try:
        vol, data, hdr = read_data(niigz)
        # Do stuff
        data = preprocess_data(data)
        # Do stuff
        data = process_data(data)
        # Do stuff
        display_data(data, name=str(niigz.stem))
        # Do stuff
        data = postprocess_data(data)
        # Do stuff
        store_data(data)
    except Exception:
        # Log the failure instead of silently swallowing it
        logging.exception("Failed to process %s", niigz)

MRI slices in x of sub-01_ses-T1_T1w.nii¶

MRI slices in y of sub-01_ses-T1_T1w.nii¶

MRI slices in z of sub-01_ses-T1_T1w.nii¶

MRI slices in x of sub-01_ses-T1_task-SSD_bold.nii¶

MRI slices in y of sub-01_ses-T1_task-SSD_bold.nii¶

MRI slices in z of sub-01_ses-T1_task-SSD_bold.nii¶

MRI slices in x of sub-01_ses-T1_task-SLI_bold.nii¶

MRI slices in y of sub-01_ses-T1_task-SLI_bold.nii¶

MRI slices in z of sub-01_ses-T1_task-SLI_bold.nii¶

MRI slices in x of sub-01_ses-T1_task-SLD_bold.nii¶

MRI slices in y of sub-01_ses-T1_task-SLD_bold.nii¶

MRI slices in z of sub-01_ses-T1_task-SLD_bold.nii¶

MRI slices in x of sub-01_ses-T1_task-SSI_bold.nii¶

MRI slices in y of sub-01_ses-T1_task-SSI_bold.nii¶

MRI slices in z of sub-01_ses-T1_task-SSI_bold.nii¶


Extra info¶

  • Ask James - James gave a bunch of data sources and ideas on Canvas. For "easy" datasets he suggests:
    • ds004958
    • ds004934
    • ds004902
  • While I was making this I HEAVILY referenced Neural Data Science in Python; you should for sure look at it. Particularly interesting for the mapper use case is its discussion on masking images.
  • OpenCV
  • Pillow
  • zen-mapper